
IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



In re Application of: 

FABRICE LESTIDEAU 

Application No.: 10/663,689 

Filed: September 17, 2003 

For: METHOD FOR TRACKING 
FACIAL FEATURES IN A 
VIDEO SEQUENCE 

Commissioner for Patents 
P.O. Box 1450 
Alexandria, VA 22313-1450 



Examiner: Unassigned 
Group Art Unit: Unassigned 

November 3, 2003 



Sir: 



SUBMISSION OF PRIORITY DOCUMENT 



In support of Applicant's claim for priority under 35 U.S.C. § 119, enclosed is a certified copy of the following Australian application:

2002951473, filed September 18, 2002. 
Applicant's undersigned attorney may be reached in our Washington, D.C. office by 
telephone at (202) 530-1010. All correspondence should continue to be directed to our address 
given below. 

Respectfully submitted. 




Attorney for Applicants 
Brian L. Klock 
Registration No. 36,570 



FITZPATRICK, CELLA, HARPER & SCINTO 
30 Rockefeller Plaza 
New York, New York 10112-3801 
Facsimile: (212)218-2200 

BLK/lmj 




Patent Office 
Canberra 



I, JONNE YABSLEY, TEAM LEADER EXAMINATION SUPPORT AND 
SALES hereby certify that annexed is a true copy of the Provisional specification 
in connection with Application No. 2002951473 for a patent by CANON 
KABUSHIKI KAISHA as filed on 18 September 2002. 



WITNESS my hand this 




JONNE YABSLEY 

TEAM LEADER EXAMINATION 

SUPPORT AND SALES 



S&FRef: 593973 



ORIGINAL 

AUSTRALIA 
Patents Act 1990 



Method for Tracking Facial Features in Video Sequence



Name and Address of Applicant:

Canon Kabushiki Kaisha, incorporated in Japan, of 30-2, Shimomaruko 3-chome, Ohta-ku, Tokyo, 146, Japan



Name of Inventor: 

Fabrice Lestideau 



This invention is best described in the following statement:



S805C 




METHOD FOR TRACKING FACIAL FEATURES IN A VIDEO SEQUENCE



Field of the Invention 

The present invention relates generally to video processing and, in particular, to the tracking of facial features in a video sequence. The term "facial features" refers particularly to the eyes and the mouth of a face.

Background 

Face tracking in a video sequence is a problem often encountered in the image processing industry. However, merely tracking the face is often not enough in certain types of applications. For example, in recognition applications the positions of the eyes and the mouth are needed [...]. Another application where the positions of the eyes and the mouth are needed is [...] interface [...]. Further examples include virtual reality applications and [...].

A number of techniques have been proposed for tracking facial features. A common technique uses skin colour to detect and track a face segment. The eyes and mouth are then detected inside this skin coloured segment. Eye [...] detection and [...] are used to locate the eyes. Most of these techniques rely on a frame by frame [...] and [...] to be highly unreliable under changing lighting conditions. [...]

Other prior art techniques use deformable [...]. Some of those techniques are able to track the features even in a side view, [...] computation power.

Yet another technique is based on the red-eye effect, often seen in photographs where a flash was used, which is caused by the reflection of light from the retina of the eye. As the reflection only occurs when the person more or less faces the camera, this technique has only limited application.

Summary 

It is an object of the present invention to substantially overcome, or at least ameliorate, one or more disadvantages of existing arrangements.
According to a first aspect of the invention, there is provided a method of tracking facial features in a video sequence, said method comprising the steps of:

(a) receiving positions of facial features for tracking in a first frame of said video sequence;

(b) [...] two-dimensional segment including said facial features for tracking;

(c) identifying candidate facial features in at least a second two-dimensional segment in said sequence of associated segments; and

(d) verifying which of said candidate facial features correspond with said facial features for tracking.

According to another aspect of the invention, there is provided a method of tracking facial features [...] features for tracking, said method comprising the steps of:

(a) identifying candidate facial features in at least a second two-dimensional segment in said sequence of associated segments;

(b) verifying which of said candidate facial features correspond with said facial features for tracking; and

(c) recovering lost facial features by using known geometric relations between facial features.

According to another aspect of the invention, there is provided an apparatus for implementing any one of the aforementioned methods.

According to another aspect of the invention there is provided a computer program product including a computer readable medium having recorded thereon a computer program for implementing any one of the methods described above.

Other aspects of the invention are also disclosed. 

Brief Description of the Drawings 
One or more embodiments of the present invention will now be described with reference to the drawings, in which:

Fig. 1 shows a flow diagram of a method of tracking facial features in a video sequence;

Fig. 2 shows a programmable device for performing the operations of the method shown in Fig. 1;

Fig. 3 shows the sub-steps of the 3D segmentation step in the method shown in Fig. 1;

Fig. 4 shows the sub-steps of forming a sub-image, which is performed in the method shown in Fig. 1;

Fig. 5 shows the sub-steps of identifying candidate facial features, which is also performed in the method shown in Fig. 1;

Figs. 6A to 6G show flow diagrams of the sub-steps of assigning candidate facial features to facial features; and

Fig. 7 shows a schematic of a face displayed in a frame, the coordinate system used, and the respective image coordinates of the left eye, right eye and mouth.

Detailed Description Including Best Mode 
Where reference is made in any one or more of the accompanying drawings to steps and/or features which have the same reference numerals, those steps and/or features have, for the purposes of this description, the same function(s) or operation(s), unless the contrary intention appears.

Some portions of the description which follows are explicitly or implicitly presented in terms of algorithms and symbolic representations of operations on data within a computer memory. These algorithmic descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. An algorithm is here, and generally, conceived to be a self-consistent sequence of steps leading to a desired result. The steps are those requiring physical manipulations of physical quantities.

Fig. 1 shows a flow diagram of a method 100 of tracking facial features in a video sequence, the facial features being the eyes and the mouth of a human face. For ease of explanation, the steps of method 100 are described with reference to the tracking of the facial features of a single face. However, it is not intended that the present invention be limited to tracking the facial features of only one face, as the described method may be used on any number of faces.

Fig. 2 shows a programmable device 200 for performing the operations of the method 100 described in detail below. Such a programmable device 200 may be specially constructed for the required purposes, or may comprise a general-purpose computer or other device selectively activated or [...].

The programmable device 200 comprises a computer module 201, input devices such as a camera 250, a keyboard 202 and mouse 203, and output devices including a display device 214. The computer module 201 typically includes at least one processor unit 205, a memory unit 206, for example formed from semiconductor random access memory (RAM) [...], input/output (I/O) interfaces including a video interface 207, an I/O interface 213 for the keyboard 202 and mouse 203, and an interface 208 for the camera 250 through connection 248. A storage device 209 is provided and typically includes a hard disk drive 210 and a floppy disk drive 211. A CD-ROM drive 212 is typically provided as a non-volatile source of data. The components 205 to 213 of the computer module 201 typically communicate via an interconnected bus 204 and in a manner which results in a conventional mode of operation of the programmable device 200 known to those in the relevant art.

The method may be implemented as software, such as an application program executing within the programmable device 200. The application program may be stored on a computer readable medium, including the storage devices 209. The application program is read into the computer from the computer readable medium, and then executed by the processor 205. Intermediate storage of the program and any data fetched [...] may be accomplished using the semiconductor memory 206, possibly in concert with the hard disk drive 210.

The processor 205 first receives as input [...] a block of pixel data of the video sequence, the pixel data at each frame interval n being a two dimensional array of colour values φ(x,y,n) [...] the storage device 209 (Fig. 2).

Next, in step 110, the processor 205 receives as input image coordinates (x_k, y_k)_1 in a first frame of the video sequence, with image coordinates (x_1, y_1)_1, (x_2, y_2)_1 and (x_3, y_3)_1 corresponding with the positions of the left eye, right eye and mouth respectively that are to be tracked. Fig. 7 shows a schematic of a face displayed in a frame, the coordinate system used, and the respective image coordinates (x_k, y_k) of the left eye, right eye and mouth. In one implementation a user provides the image coordinates (x_k, y_k)_1 to the processor 205 by using the mouse 203 for pointing and clicking on the eyes and mouth respectively in the first frame that is displayed on the display device 214 (Fig. 2). In an alternative implementation, the image coordinates (x_k, y_k)_1 are provided to method 100 by an automatic face feature detection algorithm.

Step 120 follows wherein the processor 205 performs spatiotemporal (3D) segmentation of the colour values φ(x,y,n) in the block of pixel data to produce a set of three-dimensional segments {S_i} having homogeneous colour. Any 3D segmentation algorithm may be used in step 120. In the preferred implementation, the Mumford-Shah 3D segmentation algorithm is used, which operates on the colour values φ(x,y,n) in the block of pixel data, such that every pixel in the block belongs to one segment S_i. [...] The search area in which the coordinates of the facial features may occur is greatly reduced by the 3D segmentation.

During segmentation, an assumption is made that each colour value φ(x,y,n) is associated with a particular state. The model used to define the states is decided upon in advance. Each state is defined by an unknown model parameter vector ā_i of length c, with each state being assumed to be valid over the contiguous 3D segment S_i. The aim of segmentation is to identify these 3D segments S_i and the model parameters ā_i for each segment S_i.

A model vector of measurements y(x,y,n) over each segment S_i is assumed to be a linear projection of the c-vector model parameter ā_i for that segment S_i:

$$\mathbf{y}(x,y,n) = A(x,y,n)\,\bar{a}_i, \qquad (x,y,n) \in S_i \qquad (1)$$

where A(x,y,n) is a matrix which relates the state of segment S_i to the model measurements y(x,y,n), thereby encapsulating the nature of the predefined model. In the colour video segmentation case, c = 3 and matrix A(x,y,n) is the c by c identity matrix for all (x,y,n).

Each vector of actual colour values φ(x,y,n) is subject to a random error e(x,y,n) such that

$$\phi(x,y,n) = \mathbf{y}(x,y,n) + e(x,y,n) \qquad (2)$$

Further, the error e(x,y,n) may be assumed to be drawn from a zero-mean normal (Gaussian) distribution with covariance Λ(x,y,n):

$$e(x,y,n) \sim N\big(0, \Lambda(x,y,n)\big) \qquad (3)$$

wherein Λ(x,y,n) is a c by c covariance matrix. Each component of the error e(x,y,n) is assumed to be independently and identically distributed, i.e.:

$$\Lambda(x,y,n) = \sigma^2(x,y,n)\, I_c \qquad (4)$$

where I_c is the c by c identity matrix.

Variational segmentation requires that a cost function E be assigned to each possible segmentation. A partition into segments S_i may be compactly described by a binary function J(d) on the boundary elements, in which the value "1" is assigned to each boundary element d bordering a segment S_i. This function J(d) is referred to as a boundary map.

The cost function E used in the preferred implementation is one in which a model fitting error is balanced with an overall complexity of the model. The sum of the statistical residuals of each segment S_i is used as the model fitting error. Combining Equations (1), (2), (3) and (4), the residual over segment S_i as a function of the model parameters ā_i is given by:

$$E_i(\bar{a}_i) = \sum_{(x,y,n) \in S_i} \big[\phi(x,y,n) - \bar{a}_i\big]^T \big[\phi(x,y,n) - \bar{a}_i\big] \qquad (5)$$

The model complexity is simply the number of segment-bounding elements d. Hence the overall cost functional E may be defined as

$$E(\mathbf{y}, J; \lambda) = \sum_i E_i(\bar{a}_i) + \lambda \sum_d J(d) \qquad (6)$$

where the (non-negative) parameter λ controls the relative importance of model fitting error and model complexity. The contribution of the model fitting error to the cost functional E encourages a proliferation of segments, while the model complexity encourages few segments. The functional E must therefore balance the two components to achieve a reasonable result. The aim of variational segmentation is to find a minimising model measurement y and a minimising boundary map J(d) of the overall cost functional E, for a given value of parameter λ.
Note that if the segment boundaries d are given as a valid boundary map J(d), the minimising model parameters ā_i over each segment S_i may be found by minimising the segment residuals E_i. This may be evaluated using a simple weighted linear least squares calculation. Given this fact, any valid boundary map J(d) will fully and uniquely describe a segmentation. Therefore, the cost function E may be regarded as a function over the space of valid edge maps (J-space), whose minimisation yields an optimal segment partition J_λ for a given parameter λ. The corresponding minimising model parameters ā_i may then be assumed to be those which minimise the residuals E_i over each segment S_i. The corresponding minimum residuals for segment S_i will hereafter be written as Ē_i.

If parameter λ is low, many boundaries are allowed, giving "fine" segmentation. As parameter λ increases, the segmentation gets coarser. At one extreme, the optimal segment partition J_0, where the model complexity is completely discounted, is the trivial segmentation, in which every pixel constitutes its own segment S_i, and which gives zero model fitting error. At the other extreme, the optimal segment partition J_∞, where the model fitting error is completely discounted, is the null or empty segmentation in which the entire block is represented by a single segment S_i. Somewhere between these two extremes lies the segmentation J_λ which will appear ideal in that the segments S_i correspond to a semantically meaningful partition.

To find an approximate solution to the variational segmentation problem, a segment merging strategy has been employed, wherein properties of neighbouring segments S_i and S_j are used to determine whether those segments come from the same model state, thus allowing the segments S_i and S_j to be merged as a single segment S_ij. The segment residual Ē_ij also increases after any two neighbouring segments S_i and S_j are merged.

Knowing that the trivial segmentation is the optimal segment partition J_λ for the smallest possible parameter value λ = 0, in segment merging each pixel in the block is initially labelled as its own unique segment S_i, with model parameters ā_i set to the colour values φ(x,y,n). Adjacent segments S_i and S_j are then compared using some similarity criterion and merged if they are sufficiently similar. In this way, small segments take shape, and are gradually built into larger ones.

The partitions J_λ before and after the merger differ only in the two segments S_i and S_j. Accordingly, in determining the effect on the total cost functional E after such a merger, a computation may be confined to those segments S_i and S_j. By examining Equations (5) and (6), a merging cost for the adjacent segment pair {S_i, S_j} may be written as:

$$\tau_{ij} = \frac{\bar{E}_{ij} - (\bar{E}_i + \bar{E}_j)}{l(\delta_{ij})} \qquad (7)$$

where l(δ_ij) is the area of the common boundary between 3D segments S_i and S_j. If the merging cost τ_ij is less than parameter λ, the merge is allowed.

A key to efficient segment growing is to compute the numerator of the merging cost as fast as possible. Firstly, Equation (5) is rewritten as:

$$E_j(\bar{a}_j) = (F_j - H_j \bar{a}_j)^T (F_j - H_j \bar{a}_j) \qquad (8)$$

where:

- H_j is a (v_j c) by c matrix composed of the c by c identity matrices stacked on top of one another as (x,y,n) varies over segment S_j, with v_j the number of pixels in segment S_j; and

- F_j is a column vector of length (v_j c) composed of the individual colour value φ(x,y,n) vectors stacked on top of one another.

By weighted least squares theory, the minimising model parameter vector ā_j for the segment S_j is given by the mean of the colour values φ(x,y,n) over segment S_j. The corresponding residual is given by

$$\bar{E}_j = (F_j - H_j \bar{a}_j)^T (F_j - H_j \bar{a}_j) \qquad (9)$$

When merging two segments S_i and S_j, the "merged" matrix H_ij is obtained by concatenating matrix H_i with matrix H_j; likewise for matrix F_ij. These facts may be used to show that the best fitting model parameter vector ā_ij for the merged segment S_ij is given by:

$$\bar{a}_{ij} = \frac{v_i \bar{a}_i + v_j \bar{a}_j}{v_i + v_j} \qquad (10)$$

The merged residual is given by:

$$\bar{E}_{ij} = \bar{E}_i + \bar{E}_j + \frac{v_i v_j}{v_i + v_j} \left\| \bar{a}_i - \bar{a}_j \right\|^2 \qquad (11)$$
The merging cost τ_ij in Equation (7) may be computed as:

$$\tau_{ij} = \frac{\dfrac{v_i v_j}{v_i + v_j} \left\| \bar{a}_i - \bar{a}_j \right\|^2}{l(\delta_{ij})} \qquad (12)$$

from the model parameters of the segments S_i and S_j to be merged. If the merge is allowed, Equation (10) gives the model parameter of the merged segment S_ij.
Note that under this strategy, only Equations (10) and (12) need to be applied throughout the merging process. Only the model parameters for each segment S_i are therefore required as segmentation proceeds. Further, neither the original colour values φ(x,y,n) nor the model structure itself (i.e. the matrices A(x,y,n)) are required.
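As a numerical check of Equations (10) to (12), the Python sketch below computes the merged parameters and the merging cost from segment means and pixel counts alone, and compares the residual increase against a direct computation. It assumes NumPy, colour values stored as rows of an array, and the unweighted residual of Equation (5) with A(x,y,n) equal to the identity; the function names are illustrative only.

```python
import numpy as np

def merged_parameters(a_i, v_i, a_j, v_j):
    """Equation (10): pixel-count-weighted mean of the two segment means."""
    return (v_i * a_i + v_j * a_j) / (v_i + v_j)

def residual_increase(a_i, v_i, a_j, v_j):
    """Equation (11) rearranged: E_ij - (E_i + E_j)."""
    diff = a_i - a_j
    return v_i * v_j / (v_i + v_j) * float(np.dot(diff, diff))

def merging_cost(a_i, v_i, a_j, v_j, boundary_area):
    """Equation (12): residual increase per unit of shared boundary area."""
    return residual_increase(a_i, v_i, a_j, v_j) / boundary_area

def residual(colours, a):
    """Equation (5) with A = identity: sum of squared colour deviations."""
    d = colours - a
    return float(np.sum(d * d))

# Compare Equation (11) with a brute-force residual on random colours.
rng = np.random.default_rng(0)
seg_i, seg_j = rng.random((4, 3)), rng.random((6, 3))
a_i, a_j = seg_i.mean(axis=0), seg_j.mean(axis=0)
a_ij = merged_parameters(a_i, 4, a_j, 6)
direct = residual(np.vstack([seg_i, seg_j]), a_ij) - residual(seg_i, a_i) - residual(seg_j, a_j)
print(direct, residual_increase(a_i, 4, a_j, 6))  # equal up to rounding
```

Because only means, counts and boundary areas enter the computation, a segment never needs its pixel list once it has been formed, which is the point made in the paragraph above.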

Usually, when the merging cost τ_ij exceeds a predetermined threshold λ_stop, the segmentation stops. However, it is possible that the face containing the facial features to be tracked is still segmented into more than one segment S_i. This is typically caused by challenging lighting conditions or shadows. To resolve this problem, the segment merging continues at least until the three facial features to be tracked form part of the same segment S_i. The first time step 120 is performed, the three facial features to be tracked are those identified in step 110 and occur in frame 1. In subsequent iterations, the three facial features are those verified as being the facial features in frame t, being the oldest frame in the block of pixel data.

Fig. 3 shows the sub-steps of the 3D segmentation step 120. The segmentation step 120 starts in sub-step 304 where the processor 205 sets the model parameters ā(x,y,n) to the colour values φ(x,y,n) of each pixel in the block of frames. Hence, the segmentation starts with the trivial segmentation where each pixel forms its own segment S_i. The processor 205 then determines in sub-step 306 all adjacent segment pairs S_i and S_j, and computes the merging cost τ_ij according to Equation (12) for each of the boundaries between adjacent segment pairs S_i and S_j. The boundaries with merging cost τ_ij are then inserted by the processor 205 in sub-step 308 into a priority queue P in priority order. The priority queue P is typically stored on the storage device 209 or memory 206.

The processor 205 retrieves the first entry from the priority queue P(1) and merges the corresponding segment pair S_i and S_j (i.e. the segment pair S_i and S_j with the lowest merging cost τ_ij) to form a new segment S_ij in sub-step 310.
In sub-step 314 the processor 205 identifies all segments S_k adjoining either of the merged segments S_i and S_j and merges any duplicate boundaries, adding their areas. Sub-step 318 follows where the processor 205 calculates a new merging cost τ_ij,k for each boundary between adjacent segments S_ij and S_k. The new merging costs τ_ij,k effectively reorder the priority queue P.

In sub-step 322 the processor 205 determines whether the merging cost τ_ij at the top of the priority queue P (entry P(1)) has a value greater than the predetermined threshold λ_stop. If the merging cost τ_ij of entry P(1) has a value greater than the predetermined threshold λ_stop, then the processor 205 determines in sub-step 325 whether the known image coordinates (x_k, y_k)_t, which correspond with the positions of the facial features to be tracked, are included in a single segment S_i. If all three positions of the facial features are in the same segment S_i, then the segmentation step 120 ends.

If the conditions of either sub-step 322 or sub-step 325 are not met, then segment merging must continue, and control is returned to sub-step 310, from where sub-steps 310 to 325 are repeated to merge the segment pair S_i and S_j with the lowest merging cost τ_ij.
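The merging loop of Fig. 3 can be sketched in Python with a binary heap standing in for the priority queue P. This is a minimal illustration under several assumptions: the segments are supplied as a hypothetical adjacency graph (mean colour, pixel count and shared boundary area) rather than as a pixel volume, stale queue entries are skipped lazily instead of the queue being re-ordered, and the stopping test mirrors sub-steps 322 and 325. None of the names come from the original text.

```python
import heapq
import numpy as np

def merge_segments(means, counts, boundaries, feature_segs, lam_stop):
    """Greedy segment merging in the spirit of Fig. 3 (illustrative sketch).

    means:        dict segment id -> mean colour (NumPy vector); modified in place
    counts:       dict segment id -> pixel count; modified in place
    boundaries:   dict frozenset({i, j}) -> shared boundary area; modified in place
    feature_segs: initial segment ids containing the tracked features
    lam_stop:     stopping threshold on the merging cost
    Returns a dict mapping every initial id to its final segment id.
    """
    initial = list(means)
    parent = {s: s for s in initial}

    def find(s):
        while parent[s] != s:
            s = parent[s]
        return s

    def cost(i, j):  # Equation (12)
        d = means[i] - means[j]
        v = counts[i] * counts[j] / (counts[i] + counts[j])
        return v * float(np.dot(d, d)) / boundaries[frozenset((i, j))]

    heap = [(cost(*sorted(p)), *sorted(p)) for p in boundaries]
    heapq.heapify(heap)
    next_id = max(initial) + 1

    def top_valid():
        # Lazily discard entries that refer to segments already merged away.
        while heap and (heap[0][1] not in means or heap[0][2] not in means):
            heapq.heappop(heap)
        return heap[0] if heap else None

    while top_valid() is not None:
        _, i, j = heapq.heappop(heap)          # sub-step 310: cheapest pair
        k, next_id = next_id, next_id + 1
        counts[k] = counts[i] + counts[j]
        means[k] = (counts[i] * means[i] + counts[j] * means[j]) / counts[k]  # Eq. (10)
        parent[i] = parent[j] = k
        parent[k] = k
        # Sub-step 314: collect neighbours, adding areas of duplicate boundaries.
        neighbour_area = {}
        for pair in [p for p in boundaries if i in p or j in p]:
            area = boundaries.pop(pair)
            for other in pair - {i, j}:
                neighbour_area[other] = neighbour_area.get(other, 0) + area
        del means[i], means[j], counts[i], counts[j]
        # Sub-step 318: new costs for the boundaries of the merged segment.
        for other, area in neighbour_area.items():
            boundaries[frozenset((k, other))] = area
            heapq.heappush(heap, (cost(k, other), min(k, other), max(k, other)))
        # Sub-steps 322 and 325: stop only if the next cost exceeds lam_stop
        # and all tracked features already lie in one segment.
        top = top_valid()
        together = len({find(s) for s in feature_segs}) == 1
        if top is None or (top[0] > lam_stop and together):
            break
    return {s: find(s) for s in initial}
```

Lazy deletion keeps each merge at logarithmic cost in the heap size; a production implementation might instead use an indexed priority queue that supports key updates.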

Referring again to Fig. 1, with the set of 3D segments {S_i} formed in step 120, and with the image coordinates (x_k, y_k)_t known for the oldest frame t in the block of video data, step 130 follows where the processor 205 forms a sub-image, thereby restricting the search area for identifying candidate facial features. Fig. 4 shows step 130 in more detail.

Step 130 starts in sub-step 410 where the processor 205 "slices through" the 3D segments S_i at frame interval t+1 to produce 2D segments s_{i,t+1}. Each 2D segment s_{i,t+1} incorporates the pixels at frame interval t+1 that are included in the corresponding 3D segment S_i.
In sub-step 411 the processor 205 identifies the 2D segment s'_{t+1} containing the facial features, that being the 2D segment s'_{t+1} associated with the same 3D segment S_i as the 2D segment in the previous frame containing the known coordinates (x_k, y_k)_t of the facial features being tracked. The first time step 130 is performed, the known coordinates will correspond with those received in step 110. In subsequent iterations, the three facial features are those verified as being the facial features in frame t.

Sub-step 412 follows where the processor 205 determines whether the area of the 2D segment s'_{t+1} is equal to or above a minimum area m_A. In the preferred implementation the minimum area m_A is 1500 pixels. If the area of segment s'_{t+1} is smaller than the minimum area m_A, then the features are too small to be tracked and method 100 ends in step 490.
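Sub-steps 410 to 412 can be sketched as follows. The sketch assumes the 3D segmentation is available as a hypothetical integer label volume `labels[frame, row, col]` and that a feature position is given as a (row, col) array index rather than in the Fig. 7 coordinate system; 1500 pixels is the preferred minimum area stated in the text.

```python
import numpy as np

def face_segment_at_next_frame(labels, t, feature_rc, min_area=1500):
    """Sub-steps 410-412 sketch.

    labels:     integer array [frame, row, col] of 3D segment labels (hypothetical layout)
    t:          index of the frame holding the verified feature positions
    feature_rc: (row, col) of any tracked feature in frame t
    Returns the boolean mask of the face's 2D segment at frame t+1,
    or None when its area is below min_area (method 100 would end in step 490).
    """
    r, c = feature_rc
    face_label = labels[t, r, c]              # 3D segment containing the features
    mask = labels[t + 1] == face_label        # "slice" that segment at frame t+1
    if int(mask.sum()) < min_area:
        return None
    return mask
```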

If the area of the 2D segment s'_{t+1} is equal to or above the minimum area m_A, then step 130 continues in sub-step 415 where the processor 205 creates a boundary box sub-image b_{t+1}(x,y) from the colour values φ(x,y,t+1) at frame interval t+1. The boundary box sub-image b_{t+1}(x,y) is a rectangular image formed around the segment s'_{t+1}.

The boundary box sub-image b_{t+1}(x,y) is re-scaled in sub-step 420 such that the area of the segment s'_{t+1} in the re-scaled boundary box sub-image b'_{t+1}(x,y) is equal to the predefined minimum area m_A.

Problems typically associated with displacement along the axis of the camera 250 (i.e. the face moving towards or away from the camera) are overcome by resizing the boundary box sub-image b_{t+1}(x,y) to a fixed size. Furthermore, the resizing also causes the candidate facial features, which are identified in step 140 that follows, to be of similar size inside the re-scaled boundary box sub-image b'_{t+1}(x,y) as those identified in the previous frame, thereby aiding the correspondence between frames.
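Because area scales with the square of a linear zoom, the re-scaling of sub-step 420 amounts to a zoom factor of sqrt(m_A / A), where A is the current area of the face segment. The sketch below assumes NumPy and a boolean face mask; nearest-neighbour resampling is used purely for illustration, since the text does not specify an interpolation method.

```python
import math
import numpy as np

def bounding_box(mask):
    """Row/column extent (half-open) of the True pixels in a 2D mask."""
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return rows[0], rows[-1] + 1, cols[0], cols[-1] + 1

def zoom_factor(segment_area, min_area=1500):
    """Linear zoom that makes the segment area equal min_area (area ~ zoom**2)."""
    return math.sqrt(min_area / segment_area)

def resize_nearest(img, factor):
    """Nearest-neighbour resampling (an illustrative choice of interpolation)."""
    h, w = img.shape[:2]
    nh, nw = max(1, round(h * factor)), max(1, round(w * factor))
    rows = np.minimum((np.arange(nh) / factor).astype(int), h - 1)
    cols = np.minimum((np.arange(nw) / factor).astype(int), w - 1)
    return img[rows][:, cols]

mask = np.zeros((60, 60), dtype=bool)
mask[10:50, 10:50] = True                    # a 40 x 40 face segment (1600 pixels)
r0, r1, c0, c1 = bounding_box(mask)
box = mask[r0:r1, c0:c1]                     # boundary box sub-image (mask only here)
scaled = resize_nearest(box, zoom_factor(int(mask.sum())))
print(scaled.shape, int(scaled.sum()))       # (39, 39) 1521, close to m_A = 1500
```

Rounding to whole pixels means the re-scaled area only approximates m_A.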
Sub-step 425 follows where the processor 205 crops the re-scaled boundary box sub-image b'_{t+1}(x,y) to form sub-image b''_{t+1}(x,y). At this point the previous feature coordinates (x_k, y_k)_t are scaled to the size of the sub-image b'_{t+1}(x,y), so that one unit equals one pixel in sub-image b'_{t+1}(x,y). With image coordinates (x_2, y_2)_t corresponding with the position of the right eye, if the right boundary of the sub-image b'_{t+1}(x,y) is greater than a distance Δu from x_2t, the right bound of sub-image b''_{t+1}(x,y) is set to (x_2t - Δu); else the right bound of sub-image b''_{t+1}(x,y) remains that of sub-image b'_{t+1}(x,y).

Similarly, with image coordinates (x_1, y_1)_t corresponding with the position of the left eye, if the left boundary of the sub-image b'_{t+1}(x,y) is greater than the distance Δu from x_1t, the left bound of sub-image b''_{t+1}(x,y) is set to (x_1t + Δu); else the left bound of sub-image b''_{t+1}(x,y) remains that of sub-image b'_{t+1}(x,y).

If the upper boundary of the sub-image b'_{t+1}(x,y) is greater than the distance Δu from the upper one of y_1t and y_2t, then the upper bound of sub-image b''_{t+1}(x,y) is set to (max(y_1t, y_2t) + Δu); else the upper bound of sub-image b''_{t+1}(x,y) remains that of sub-image b'_{t+1}(x,y).

Finally, with image coordinates (x_3, y_3)_t corresponding with the position of the mouth, if the lower boundary of the sub-image b'_{t+1}(x,y) is greater than the distance Δu from y_3t, then the lower bound of sub-image b''_{t+1}(x,y) is set to (y_3t - Δu); else the lower bound of sub-image b''_{t+1}(x,y) remains that of sub-image b'_{t+1}(x,y).
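The four cropping rules can be written as one small function. This sketch assumes the bounds are plain numbers in the same coordinate frame as the scaled feature positions, with the left bound at the largest x, the right bound at the smallest x and the upper bound at the largest y, so that the left eye limits the left bound, the right eye the right bound, the higher eye the upper bound and the mouth the lower bound. The parameter `du` stands for the distance Δu; all names are hypothetical.

```python
def crop_bounds(left, right, upper, lower, left_eye, right_eye, mouth, du=5):
    """Sub-step 425 sketch.

    Convention assumed here: the left bound has the largest x, the right bound
    the smallest x, and the upper bound the largest y.  Each bound is pulled in
    to within du of the relevant feature only if it currently lies further away.
    """
    x1, y1 = left_eye
    x2, y2 = right_eye
    _, y3 = mouth
    if left - x1 > du:            # left eye limits the left bound
        left = x1 + du
    if x2 - right > du:           # right eye limits the right bound
        right = x2 - du
    top_eye = max(y1, y2)         # the upper one of the two eyes
    if upper - top_eye > du:
        upper = top_eye + du
    if y3 - lower > du:           # mouth limits the lower bound
        lower = y3 - du
    return left, right, upper, lower

print(crop_bounds(100, 0, 100, 0, (60, 50), (40, 50), (50, 20)))  # (65, 35, 55, 15)
```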
In the preferred implementation the distance Δu is 5. After cropping, the sub-image b''_{t+1}(x,y) is also stored in memory 206. Sub-image b''_{t+1}(x,y) now defines the search area for identifying candidate facial features in step 140 that follows step 130. At this point, the (scaled) previous feature coordinates (x_k, y_k)_t are referred to a new origin, namely the centroid of the current face segment s'_{t+1}. All subsequent coordinates are with respect to this origin, unless otherwise stated. Accordingly, method 100 (Fig. 1) continues to step 140 where the processor 205 identifies candidate facial features, and extracts information from each of the candidate facial features in sub-image b''_{t+1}(x,y).

Fig. 5 shows a flow diagram of the sub-steps of step 140. A first part of step 140 extracts a facial feature map b'''_{t+1}(x,y) from sub-image b''_{t+1}(x,y). The facial feature map b'''_{t+1}(x,y) is a binary image, with the value of its pixels set to "1" when those pixels are found to be part of a candidate facial feature and set to "0" otherwise. A second part of step 140 then extracts information from each of the candidate facial features, that information including the positions of the candidate facial features inside the facial feature map b'''_{t+1}(x,y), and shape characteristics of the candidate facial features.

The extraction of a facial feature map b'''_{t+1}(x,y) from the sub-image b''_{t+1}(x,y) is based on two characteristics of facial features. The first is that such features give small edges after edge detection, and the second is that they are darker than the rest of the face. Using these characteristics, two initial facial feature maps f_1(x,y) and f_2(x,y) are formed from the sub-image b''_{t+1}(x,y), from which the facial feature map b'''_{t+1}(x,y) is then formed.

Step 140 starts in sub-step 505 where the processor 205 forms a first initial facial feature map f_1(x,y) by applying an edge detection algorithm to the sub-image b''_{t+1}(x,y). Any edge detection technique may be used, but in the preferred implementation the Laplacian edge detection technique is used. A mask Γ is first applied to sub-image b''_{t+1}(x,y) to form a gradient map, followed by applying a threshold to the gradient map. In the preferred implementation the mask Γ is:

$$\Gamma = \begin{bmatrix} 0.0276655 \\ -0.224751 \\ 1.65895 \\ -2.92373 \\ 1.65895 \\ -0.224751 \\ 0.0276655 \end{bmatrix} \qquad (13)$$

and the threshold is set such that 15% of pixels in the first initial facial feature map f_1(x,y) are set to "1", those pixels corresponding to the pixels in the gradient map having the highest gradient.
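A possible NumPy rendering of sub-step 505 follows. The text gives the one-dimensional mask Γ of Equation (13) but not how it is applied in two dimensions; this sketch assumes it is applied along rows and along columns with the two responses summed (a discrete Laplacian), and that "highest gradient" means the largest absolute response. Images must be at least 7 pixels along each axis so that the `mode="same"` convolution preserves their shape.

```python
import numpy as np

# Mask Gamma of Equation (13); its taps sum to (almost exactly) zero.
GAMMA = np.array([0.0276655, -0.224751, 1.65895, -2.92373,
                  1.65895, -0.224751, 0.0276655])

def laplacian_response(img):
    """Apply Gamma along rows and along columns and add the two responses
    (assumed 2D use of the 1D mask). Needs at least 7 pixels per axis."""
    img = np.asarray(img, dtype=float)
    along_x = np.apply_along_axis(np.convolve, 1, img, GAMMA, mode="same")
    along_y = np.apply_along_axis(np.convolve, 0, img, GAMMA, mode="same")
    return along_x + along_y

def top_fraction_mask(values, frac=0.15):
    """Binary map with the round(frac * N) largest values set to True."""
    k = int(round(frac * values.size))
    mask = np.zeros(values.shape, dtype=bool)
    if k > 0:
        mask.flat[np.argsort(values, axis=None)[-k:]] = True
    return mask

def edge_feature_map(img, frac=0.15):
    """First initial facial feature map f1 (sub-step 505 sketch)."""
    return top_fraction_mask(np.abs(laplacian_response(img)), frac)
```

Selecting exactly the top 15% by rank, rather than by a fixed gradient value, is what makes the fraction of "1" pixels independent of image contrast.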
The second initial facial feature map f_2(x,y) is formed by applying a threshold in sub-step 510 to sub-image b''_{t+1}(x,y), giving a value of "1" to pixels with intensity values lower than a predetermined value and a value of "0" to pixels with intensity values above the predetermined value. Again, the threshold is set such that 15% of pixels in the second initial facial feature map f_2(x,y) are set to "1".

The facial feature map b'''_{t+1}(x,y) is then formed by the processor 205 in sub-step 515 by taking the result of applying an "AND" function on the two initial facial feature maps f_1(x,y) and f_2(x,y). Hence, if corresponding pixels in the two initial facial feature maps f_1(x,y) and f_2(x,y) both have the value "1", then that pixel has the value "1" in the facial feature map b'''_{t+1}(x,y).
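Sub-steps 510 and 515 reduce to a rank-based threshold and a pixel-wise AND. In this sketch the intensity image is assumed to be a greyscale version of the cropped sub-image (the conversion is not specified in the text), and `f1` is any boolean array of the same shape, for example an edge-based map; the toy arrays in the usage lines are invented for illustration.

```python
import numpy as np

def darkest_fraction_mask(intensity, frac=0.15):
    """Second initial map f2 (sub-step 510 sketch): the round(frac * N)
    darkest pixels are set to True."""
    k = int(round(frac * intensity.size))
    mask = np.zeros(intensity.shape, dtype=bool)
    if k > 0:
        mask.flat[np.argsort(intensity, axis=None, kind="stable")[:k]] = True
    return mask

def facial_feature_map(f1, f2):
    """Sub-step 515: a pixel is 1 only where both initial maps are 1."""
    return np.logical_and(f1, f2)

intensity = np.arange(100).reshape(10, 10)    # toy greyscale sub-image
f2 = darkest_fraction_mask(intensity)         # the 15 darkest pixels
f1 = np.zeros((10, 10), dtype=bool)
f1.flat[10:20] = True                         # toy edge map
print(int(facial_feature_map(f1, f2).sum()))  # 5
```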

The processor 205 next, in sub-step 520, identifies candidate facial features and then extracts the position and characteristics of each identified candidate facial feature. A candidate facial feature is identified as a cluster of pixels in the facial feature map b'''_{t+1}(x,y) having values of "1". If the shape of the 2D segment s'_{t+1} identified as containing the face does not change drastically, the position (x_i, y_i)' of each facial feature should be independent of the position of the face in the frame. The characteristics of the candidate facial feature include its area A'_i, which is the number of pixels in the cluster, and the eccentricity ε'_i of the candidate facial feature, which is the ratio of the width against the height of the candidate facial feature.
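Sub-step 520 can be sketched with a breadth-first flood fill over the binary feature map. The choice of 4-connectivity, the use of the cluster centroid as its position and the use of the bounding box for width and height are assumptions, not details given in the text. Positions are returned as (column, row) offsets from a caller-supplied origin, such as the face-segment centroid; note that array rows increase downwards.

```python
from collections import deque
import numpy as np

def candidate_features(fmap, origin=(0.0, 0.0)):
    """Sub-step 520 sketch: find 4-connected clusters of True pixels and
    return, for each, its centroid (x, y) relative to `origin`, its area in
    pixels and its eccentricity (bounding-box width / height)."""
    h, w = fmap.shape
    seen = np.zeros((h, w), dtype=bool)
    features = []
    for r in range(h):
        for c in range(w):
            if not fmap[r, c] or seen[r, c]:
                continue
            seen[r, c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:                      # breadth-first flood fill
                y, x = queue.popleft()
                pixels.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and fmap[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            ys = np.array([p[0] for p in pixels])
            xs = np.array([p[1] for p in pixels])
            width = xs.max() - xs.min() + 1
            height = ys.max() - ys.min() + 1
            features.append({
                "pos": (float(xs.mean()) - origin[0], float(ys.mean()) - origin[1]),
                "area": len(pixels),
                "ecc": float(width / height),
            })
    return features
```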

Referring again to Fig. 1, with candidate facial features identified and information extracted, the processor 205 identifies in step 150 which of the candidate facial features possibly correspond with the facial features for tracking identified in step 110, and in particular, the facial features verified in a previous frame as corresponding with the facial features for tracking identified in step 110. In order to identify whether the i-th candidate facial feature possibly corresponds with the facial features for tracking, the processor 205 calculates the Euclidean distance d_ij between the i-th candidate facial feature in frame t+1 and the j-th facial feature in frame t as follows:

$$d_{ij} = \sqrt{(x_i' - x_{j,t})^2 + (y_i' - y_{j,t})^2} \qquad (14)$$

If the Euclidean distance d_ij between the i-th candidate facial feature in frame t+1 and the j-th facial feature in frame t is below a threshold, then a characteristic similarity measure Δ_ij is calculated using the respective areas A'_i and A_j and eccentricities ε'_i and ε_j:

(15)

If the characteristic similarity measure Δ_ij is also below a threshold, then the i-th candidate facial feature has a possible correspondence with the j-th facial feature, and an entry i is added to a list L_j. After processing all of the candidate facial features in this manner, the list L_j contains the identifier of each of the candidate facial features that may possibly correspond to the j-th facial feature in the previous frame.

An empty list L_j means that no candidate facial feature corresponding to the j-th facial feature in the previous frame has been identified. Two events may cause list L_j to be empty. Firstly, it may be that the j-th facial feature was not detected in the previous frame. In this event the Euclidean distance would be higher than the threshold, and an attempt is made in step 155 that follows to recover such a facial feature in case it has reappeared. The second event that may cause list L_j to be empty is when the j-th facial feature was detected in the previous frame, but then disappeared in the current frame.
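The candidate lists L_j of step 150 can be built as below. The formula for the characteristic similarity measure (Equation (15)) is not reproduced here, so it is passed in as a function; the `toy_similarity` in the usage lines is purely hypothetical. Each list entry stores the pair (i, Δ_ij) rather than i alone so that the smallest measure can be selected later, and feature records are assumed to be dicts with `pos`, `area` and `ecc` keys.

```python
import math

def build_candidate_lists(candidates, tracked, dist_thr, sim_thr, similarity):
    """Step 150 sketch.

    candidates: features found in frame t+1, dicts with 'pos', 'area', 'ecc'
    tracked:    the verified features j in frame t, same keys
    similarity: stand-in for the characteristic similarity measure of Eq. (15)
    Returns one list per tracked feature, holding (i, delta_ij) pairs.
    """
    lists = []
    for feat in tracked:
        entries = []
        for i, cand in enumerate(candidates):
            d_ij = math.dist(cand["pos"], feat["pos"])   # Euclidean distance, Eq. (14)
            if d_ij >= dist_thr:
                continue
            delta_ij = similarity(cand, feat)
            if delta_ij < sim_thr:
                entries.append((i, delta_ij))
        lists.append(entries)
    return lists

# Hypothetical similarity measure, for illustration only.
def toy_similarity(a, b):
    return abs(a["area"] - b["area"]) / max(a["area"], b["area"]) + abs(a["ecc"] - b["ecc"])

cands = [{"pos": (0, 0), "area": 10, "ecc": 1.0},
         {"pos": (10, 10), "area": 10, "ecc": 1.0},
         {"pos": (1, 0), "area": 50, "ecc": 3.0}]
left_eye = {"pos": (0.5, 0), "area": 10, "ecc": 1.0}
print(build_candidate_lists(cands, [left_eye], 3.0, 0.5, toy_similarity))  # [[(0, 0.0)]]
```

`math.dist` requires Python 3.8 or later.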

Method 100 (Fig. 1) now proceeds to step 155 where the processor 205 assigns one
of the possible candidate facial features to the j-th facial feature, and attempts to recover
facial features where appropriate. Let the lists $L_1$, $L_2$ and $L_3$ relate to the left eye, right eye
and mouth respectively. A total of 8 possible scenarios may arise, which are:

1: Lists $L_1$, $L_2$ and $L_3$ are empty, which means that no corresponding facial
features were identified;

2: Lists $L_2$ and $L_3$ are empty, which means that the left eye has a corresponding
candidate facial feature;

3: Lists $L_1$ and $L_3$ are empty, which means that the right eye has a corresponding
candidate facial feature;

4: Lists $L_1$ and $L_2$ are empty, which means that the mouth has a corresponding
candidate facial feature;

5: List $L_2$ is empty, which means that the left eye and the mouth have
corresponding candidate facial features;

6: List $L_1$ is empty, which means that the right eye and the mouth have
corresponding candidate facial features;

7: List $L_3$ is empty, which means that the eyes have corresponding candidate facial
features; and

8: Lists $L_1$, $L_2$ and $L_3$ all have entries, which means that both eyes and the mouth
have corresponding candidate facial features.

In the case when scenario 1 exists, no corresponding facial features were
identified and the method 100 terminates, as the features being tracked have been lost.
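The eight scenarios are fully determined by which of the three lists are empty, so the dispatch at the start of step 155 can be sketched as a lookup keyed on that pattern. This is a minimal illustration; the table and function names are hypothetical.

```python
from typing import Dict, Sequence, Tuple

# Key: (L1 non-empty, L2 non-empty, L3 non-empty) -> scenario number.
SCENARIOS: Dict[Tuple[bool, bool, bool], int] = {
    (False, False, False): 1,  # nothing found: tracking lost
    (True, False, False): 2,   # left eye only
    (False, True, False): 3,   # right eye only
    (False, False, True): 4,   # mouth only
    (True, False, True): 5,    # left eye and mouth
    (False, True, True): 6,    # right eye and mouth
    (True, True, False): 7,    # both eyes
    (True, True, True): 8,     # both eyes and mouth
}


def classify_scenario(l1: Sequence[int], l2: Sequence[int], l3: Sequence[int]) -> int:
    """Return the scenario number (1 to 8) for the candidate lists L1, L2 and L3."""
    return SCENARIOS[(len(l1) > 0, len(l2) > 0, len(l3) > 0)]


print(classify_scenario([0], [], [2]))  # 5
```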

In the case when scenario 2 exists, the left eye has at least one corresponding
candidate facial feature. Fig. 6A shows a flow diagram of the sub-steps of step 155 in the
case when scenario 2 exists. Step 155 starts in sub-step 605 where the possible candidate
facial feature in list $L_1$ with the smallest characteristic similarity measure $D_{i1}$ is assigned
as corresponding to the left eye. In sub-step 607 the processor 205 then saves the position
$(x_i, y_i)'$ of the left eye as coordinates $(x_1, y_1)_{t+1}$, together with its area $A_i'$ as area $A_1$ and its
eccentricity $\varepsilon_i'$ as eccentricity $\varepsilon_1$.

The processor 205 then determines in sub-step 610 whether the coordinates
$(x_2, y_2)_t$ of the right eye were known in the previous frame. If the coordinates $(x_2, y_2)_t$ of the
right eye were not known, then the right eye was not detected in the previous frame, but
may have reappeared in the current frame. Accordingly, sub-step 615 attempts to recover
the right eye by determining whether any candidate facial features have coordinates
$(x_i, y_i)'$ satisfying the following conditions, starting with the candidate facial feature with
the lowest characteristic similarity measure $D_{ij}$:

$$-3 < (y_i' - y_{1,(t+1)}) < 3 \quad \text{AND} \quad 5 < (x_i' - x_{1,(t+1)}) < 11 \qquad (16)$$
The processor 205 then determines in sub-step 620 whether a facial feature
corresponding to the right eye was found in sub-step 615. If a facial feature
corresponding to the right eye was found then, in sub-step 622, the processor 205 saves
the position $(x_i, y_i)'$ of the candidate facial feature corresponding to the right eye as
coordinates $(x_2, y_2)_{t+1}$, together with its area $A_i'$ as area $A_2$ and its eccentricity $\varepsilon_i'$ as
eccentricity $\varepsilon_2$.
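Sub-steps 615 to 622 amount to scanning the candidates in increasing order of their characteristic similarity measure and accepting the first one whose offset from the known left eye lies strictly inside the window of Equation (16). A minimal sketch, assuming candidate positions and their similarity scores are supplied as parallel sequences; the function and parameter names are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]


def recover_by_offset(
    reference: Point,
    candidates: Sequence[Point],
    scores: Sequence[float],
    dx_range: Tuple[float, float],
    dy_range: Tuple[float, float],
) -> Optional[int]:
    """Return the index of the first candidate, in order of increasing score,
    whose offset (x_i' - x_ref, y_i' - y_ref) lies strictly inside the window;
    return None if no candidate qualifies (the feature stays "unknown")."""
    x_ref, y_ref = reference
    for i in sorted(range(len(candidates)), key=lambda k: scores[k]):
        dx = candidates[i][0] - x_ref
        dy = candidates[i][1] - y_ref
        if dx_range[0] < dx < dx_range[1] and dy_range[0] < dy < dy_range[1]:
            return i
    return None


# Equation (16): right eye relative to the left eye (x_1, y_1)_{t+1}.
EQ16_DX = (5.0, 11.0)
EQ16_DY = (-3.0, 3.0)

left_eye = (10.0, 20.0)
cands = [(30.0, 20.0), (17.0, 21.0), (18.0, 19.0)]
scores = [0.1, 0.5, 0.3]
print(recover_by_offset(left_eye, cands, scores, EQ16_DX, EQ16_DY))  # 2
```

In the usage example, candidate 0 has the best score but lies 20 pixels to the right, outside the window, so the scan moves on to candidate 2 (the next best score), which is accepted.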

The processor 205 determines in sub-step 625 whether the coordinates $(x_3, y_3)_t$ of
the mouth were known in the previous frame. If the coordinates $(x_3, y_3)_t$ of the mouth
were not known, then the mouth was not detected in the previous frame, but may have
reappeared in the current frame. Accordingly, sub-step 630 attempts to recover the mouth
by creating a triangle defined by the coordinates of the two eyes $(x_1, y_1)_{t+1}$ and $(x_2, y_2)_{t+1}$
and any of the candidate features $(x_i, y_i)'$, starting with the candidate facial feature with
the lowest characteristic similarity measure $D_{ij}$.

Let $\alpha_1$ and $\alpha_2$ be the angles in radians of the triangle at the left eye and the right
eye respectively. Also, let $\delta$ be the distance from the centre of the eye-eye line to the
candidate facial feature under consideration. If the following conditions are met:

$$(0.9 < \alpha_1 < 1.45) \quad \text{AND} \quad (0.9 < \alpha_2 < 1.45) \quad \text{AND} \quad (10 < \delta < 15) \qquad (17)$$

then the candidate facial feature under consideration corresponds with the mouth.
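Equation (17) needs the two base angles of the triangle and the distance $\delta$ from the centre of the eye-eye line. One way to compute them, sketched here under the assumption that positions are plain `(x, y)` tuples, is to take each base angle from the dot product of the two edges meeting at that vertex. The helper names are hypothetical, and the sketch covers only the case described here, with the two eyes as the known vertices.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def angle_at(vertex: Point, a: Point, b: Point) -> float:
    """Interior angle in radians at `vertex` of the triangle (vertex, a, b)."""
    v1 = (a[0] - vertex[0], a[1] - vertex[1])
    v2 = (b[0] - vertex[0], b[1] - vertex[1])
    norm = math.hypot(*v1) * math.hypot(*v2)
    if norm == 0.0:
        return 0.0  # degenerate triangle: fails the angle test below
    cos_angle = (v1[0] * v2[0] + v1[1] * v2[1]) / norm
    return math.acos(max(-1.0, min(1.0, cos_angle)))  # clamp rounding error


def satisfies_eq17(left_eye: Point, right_eye: Point, cand: Point) -> bool:
    """Check the conditions of Equation (17) for a mouth candidate."""
    alpha1 = angle_at(left_eye, right_eye, cand)
    alpha2 = angle_at(right_eye, left_eye, cand)
    mid = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    delta = math.hypot(cand[0] - mid[0], cand[1] - mid[1])
    return 0.9 < alpha1 < 1.45 and 0.9 < alpha2 < 1.45 and 10 < delta < 15


print(satisfies_eq17((0.0, 0.0), (8.0, 0.0), (4.0, -12.0)))  # True
print(satisfies_eq17((0.0, 0.0), (8.0, 0.0), (4.0, -3.0)))   # False
```

For the first call both base angles are about 1.25 rad and $\delta = 12$, so all three conditions hold; in the second the candidate is too close to the eye-eye line.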

If it is determined in sub-step 610 that the coordinates $(x_2, y_2)_t$ of the right eye
were known in the previous frame, or if it is determined in sub-step 620 that a facial
feature corresponding to the right eye was not found, then the right eye disappeared in the
current frame. Step 155 proceeds to sub-step 640 where the processor 205 sets the
coordinates $(x_2, y_2)_{t+1}$ to "unknown". The processor 205 then determines in sub-step 645
whether the coordinates $(x_3, y_3)_t$ of the mouth were known in the previous frame. If the
coordinates $(x_3, y_3)_t$ of the mouth were not known, then the mouth was not detected in the
previous frame, but may have reappeared in the current frame. Accordingly, sub-step 650
attempts to recover the mouth by determining whether any candidate facial features have
coordinates $(x_i, y_i)'$ satisfying the following conditions, starting again with the candidate
facial feature with the lowest characteristic similarity measure $D_{ij}$:

$$-7 < (x_{1,(t+1)} - x_i') < -2 \quad \text{AND} \quad 7 < (y_{1,(t+1)} - y_i') < 14 \qquad (18)$$
Following sub-step 630 or 650, the processor 205 then determines in sub-step
635 whether a facial feature corresponding to the mouth was found in sub-step 630 or
650. If a facial feature corresponding to the mouth was found then, in sub-step 637, the
processor 205 saves the position $(x_i, y_i)'$ of the candidate facial feature corresponding to
the mouth as coordinates $(x_3, y_3)_{t+1}$, together with its area $A_i'$ as area $A_3$ and its eccentricity
$\varepsilon_i'$ as eccentricity $\varepsilon_3$. Step 155 then ends.

If it is determined in either of sub-steps 625 or 645 that the coordinates $(x_3, y_3)_t$ of
the mouth were known in the previous frame, then the mouth disappeared in the current
frame and the processor 205 sets the coordinates $(x_3, y_3)_{t+1}$ to "unknown" in sub-step 655
before step 155 ends.

In the case when scenario 3 exists, the right eye has at least one corresponding
candidate facial feature. Fig. 6B shows a flow diagram of the sub-steps of step 155 in the
case when scenario 3 exists. Step 155 starts in sub-step 705 where the possible candidate
facial feature in list $L_2$ with the smallest characteristic similarity measure $D_{i2}$ is assigned
as corresponding to the right eye. In sub-step 707 the processor 205 then saves the
position $(x_i, y_i)'$ of the right eye as coordinates $(x_2, y_2)_{t+1}$, together with its area $A_i'$ as area
$A_2$ and its eccentricity $\varepsilon_i'$ as eccentricity $\varepsilon_2$.

The processor 205 then determines in sub-step 710 whether the coordinates
$(x_1, y_1)_t$ of the left eye were known in the previous frame. If the coordinates $(x_1, y_1)_t$ of the
left eye were not known, then the left eye was not detected in the previous frame, but may
have reappeared in the current frame. Accordingly, sub-step 715 attempts to recover the
left eye by determining whether any candidate facial features have coordinates $(x_i, y_i)'$
satisfying the following conditions, starting with the candidate facial feature with the
lowest characteristic similarity measure $D_{ij}$:

$$-3 < (y_i' - y_{2,(t+1)}) < 3 \quad \text{AND} \quad -11 < (x_i' - x_{2,(t+1)}) < -5 \qquad (19)$$
The processor 205 then determines in sub-step 720 whether a facial feature
corresponding to the left eye was found in sub-step 715. If a facial feature corresponding
to the left eye was found then, in sub-step 722, the processor 205 saves the position $(x_i, y_i)'$
of the candidate facial feature corresponding to the left eye as coordinates $(x_1, y_1)_{t+1}$,
together with its area $A_i'$ as area $A_1$ and its eccentricity $\varepsilon_i'$ as eccentricity $\varepsilon_1$.

The processor 205 determines in sub-step 725 whether the coordinates $(x_3, y_3)_t$ of
the mouth were known in the previous frame. If the coordinates $(x_3, y_3)_t$ of the mouth
were not known, then the mouth was not detected in the previous frame, but may have
reappeared in the current frame. Accordingly, sub-step 730 attempts to recover the mouth
by creating a triangle defined by the coordinates of the two eyes $(x_1, y_1)_{t+1}$ and $(x_2, y_2)_{t+1}$
and any of the candidate features $(x_i, y_i)'$, starting with the candidate facial feature with
the lowest characteristic similarity measure $D_{ij}$. If the conditions of Equation (17) are
met, then the candidate facial feature under consideration corresponds with the mouth.

If it is determined in sub-step 710 that the coordinates $(x_1, y_1)_t$ of the left eye were
known in the previous frame, or if it is determined in sub-step 720 that a facial feature
corresponding to the left eye was not found, then the left eye disappeared in the current
frame. Step 155 proceeds to sub-step 740 where the processor 205 sets the coordinates
$(x_1, y_1)_{t+1}$ to "unknown". The processor 205 then determines in sub-step 745 whether the
coordinates $(x_3, y_3)_t$ of the mouth were known in the previous frame. If the coordinates
$(x_3, y_3)_t$ of the mouth were not known, then the mouth was not detected in the previous
frame, but may have reappeared in the current frame. Accordingly, sub-step 750 attempts
to recover the mouth by determining whether any candidate facial features have
coordinates $(x_i, y_i)'$ satisfying the following conditions, starting again with the candidate
facial feature with the lowest characteristic similarity measure $D_{ij}$:

$$2 < (x_{2,(t+1)} - x_i') < 7 \quad \text{AND} \quad 7 < (y_{2,(t+1)} - y_i') < 14 \qquad (20)$$
Following sub-step 730 or 750, the processor 205 then determines in sub-step
735 whether a facial feature corresponding to the mouth was found in sub-step 730 or
750. If a facial feature corresponding to the mouth was found then, in sub-step 737, the
processor 205 saves the position $(x_i, y_i)'$ of the candidate facial feature corresponding to
the mouth as coordinates $(x_3, y_3)_{t+1}$, together with its area $A_i'$ as area $A_3$ and its eccentricity
$\varepsilon_i'$ as eccentricity $\varepsilon_3$. Step 155 then ends.

If it is determined in either of sub-steps 725 or 745 that the coordinates $(x_3, y_3)_t$ of
the mouth were known in the previous frame, then the mouth disappeared in the current
frame and the processor 205 sets the coordinates $(x_3, y_3)_{t+1}$ to "unknown" in sub-step 755
before step 155 ends.

In the case when scenario 4 exists, the mouth has at least one corresponding
candidate facial feature. Fig. 6C shows a flow diagram of the sub-steps of step 155 in the
case when scenario 4 exists. Step 155 starts in sub-step 805 where the possible candidate
facial feature in list $L_3$ with the smallest characteristic similarity measure $D_{i3}$ is assigned
as corresponding to the mouth. In sub-step 807 the processor 205 then saves the position
$(x_i, y_i)'$ of the mouth as coordinates $(x_3, y_3)_{t+1}$, together with its area $A_i'$ as area $A_3$ and its
eccentricity $\varepsilon_i'$ as eccentricity $\varepsilon_3$.

The processor 205 then determines in sub-step 810 whether the coordinates
$(x_1, y_1)_t$ of the left eye were known in the previous frame. If the coordinates $(x_1, y_1)_t$ of the
left eye were not known, then the left eye was not detected in the previous frame, but may
have reappeared in the current frame. Accordingly, sub-step 815 attempts to recover the
left eye by determining whether any candidate facial features have coordinates $(x_i, y_i)'$
satisfying the following conditions, starting with the candidate facial feature with the
lowest characteristic similarity measure $D_{ij}$:

$$7 < (y_i' - y_{3,(t+1)}) < 14 \quad \text{AND} \quad -7 < (x_i' - x_{3,(t+1)}) < -2 \qquad (21)$$
The processor 205 then determines in sub-step 820 whether a facial feature
corresponding to the left eye was found in sub-step 815. If a facial feature corresponding
to the left eye was found then, in sub-step 822, the processor 205 saves the position $(x_i, y_i)'$
of the candidate facial feature corresponding to the left eye as coordinates $(x_1, y_1)_{t+1}$,
together with its area $A_i'$ as area $A_1$ and its eccentricity $\varepsilon_i'$ as eccentricity $\varepsilon_1$.

The processor 205 determines in sub-step 825 whether the coordinates $(x_2, y_2)_t$ of
the right eye were known in the previous frame. If the coordinates $(x_2, y_2)_t$ of the right eye
were not known, then the right eye was not detected in the previous frame, but may have
reappeared in the current frame. Accordingly, sub-step 830 attempts to recover the right
eye by creating a triangle defined by the coordinates of the left eye $(x_1, y_1)_{t+1}$, the mouth
$(x_3, y_3)_{t+1}$ and any of the candidate features $(x_i, y_i)'$, starting with the candidate facial
feature with the lowest characteristic similarity measure $D_{ij}$. If the conditions of Equation
(17) are met, then the candidate facial feature under consideration corresponds with the
right eye.

If it is determined in sub-step 810 that the coordinates $(x_1, y_1)_t$ of the left eye were
known in the previous frame, or if it is determined in sub-step 820 that a facial feature
corresponding to the left eye was not found, then the left eye disappeared in the current
frame. Step 155 proceeds to sub-step 840 where the processor 205 sets the coordinates
$(x_1, y_1)_{t+1}$ to "unknown". The processor 205 then determines in sub-step 845 whether the
coordinates $(x_2, y_2)_t$ of the right eye were known in the previous frame. If the coordinates
$(x_2, y_2)_t$ of the right eye were not known, then the right eye was not detected in the
previous frame, but may have reappeared in the current frame. Accordingly, sub-step 850
attempts to recover the right eye by determining whether any candidate facial features
have coordinates $(x_i, y_i)'$ satisfying the following conditions, starting with the candidate
facial feature with the lowest characteristic similarity measure $D_{ij}$:

$$2 < (x_i' - x_{3,(t+1)}) < 7 \quad \text{AND} \quad 7 < (y_i' - y_{3,(t+1)}) < 14 \qquad (22)$$
Following sub-step 830 or 850, the processor 205 then determines in sub-step
835 whether a facial feature corresponding to the right eye was found in sub-step 830 or
850. If a facial feature corresponding to the right eye was found then, in sub-step 837, the
processor 205 saves the position $(x_i, y_i)'$ of the candidate facial feature corresponding to
the right eye as coordinates $(x_2, y_2)_{t+1}$, together with its area $A_i'$ as area $A_2$ and its
eccentricity $\varepsilon_i'$ as eccentricity $\varepsilon_2$. Step 155 then ends.
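Equations (16) and (18) to (22) differ only in which feature serves as the reference and in the bounds of the window. Rewriting (18) and (20) in the same candidate-minus-reference form as the others (by negating the differences and swapping the bounds) shows that each recovery and its inverse use mirrored windows. The table and check below are an illustrative sketch; the dictionary layout and helper names are assumptions.

```python
from typing import Dict, Tuple

Window = Tuple[Tuple[float, float], Tuple[float, float]]

# (feature to recover, reference feature) -> ((dx_lo, dx_hi), (dy_lo, dy_hi)),
# where dx = x_i' - x_ref and dy = y_i' - y_ref; all bounds are strict.
WINDOWS: Dict[Tuple[str, str], Window] = {
    ("right_eye", "left_eye"): ((5, 11), (-3, 3)),     # Equation (16)
    ("mouth", "left_eye"): ((2, 7), (-14, -7)),        # Equation (18), negated
    ("left_eye", "right_eye"): ((-11, -5), (-3, 3)),   # Equation (19)
    ("mouth", "right_eye"): ((-7, -2), (-14, -7)),     # Equation (20), negated
    ("left_eye", "mouth"): ((-7, -2), (7, 14)),        # Equation (21)
    ("right_eye", "mouth"): ((2, 7), (7, 14)),         # Equation (22)
}


def in_window(target: str, reference: str,
              cand: Tuple[float, float], ref_pos: Tuple[float, float]) -> bool:
    """True if cand lies inside the window for recovering `target` from `reference`."""
    (dx_lo, dx_hi), (dy_lo, dy_hi) = WINDOWS[(target, reference)]
    dx, dy = cand[0] - ref_pos[0], cand[1] - ref_pos[1]
    return dx_lo < dx < dx_hi and dy_lo < dy < dy_hi


def mirrored(window: Window) -> Window:
    """Window for the inverse recovery: negate and swap each pair of bounds."""
    (a, b), (c, d) = window
    return ((-b, -a), (-d, -c))


# Each recovery and its inverse use mirrored windows.
for (target, reference), window in WINDOWS.items():
    assert WINDOWS[(reference, target)] == mirrored(window)

print(in_window("mouth", "left_eye", (14.0, 10.0), (10.0, 20.0)))  # True
```

As the equations are written, the mouth lies 7 to 14 units below the eyes in y and between them in x, which is why the eye-from-mouth windows are the mirror images of the mouth-from-eye windows.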

If it is determined in either of sub-steps 825 or 845 that the coordinates $(x_2, y_2)_t$ of
the right eye were known in the previous frame, then the right eye disappeared in the
current frame and the processor 205 sets the coordinates $(x_2, y_2)_{t+1}$ to "unknown" in
sub-step 855 before step 155 ends.

In the case when scenario 5 exists, the left eye and the mouth each have at least one
corresponding candidate facial feature. Fig. 6D shows a flow diagram of the sub-steps of
step 155 in the case when scenario 5 exists. Step 155 then starts in sub-step 902 where
the processor 205 forms a vector $(x_3 - x_1, y_3 - y_1)_t$ from the left eye to the mouth in the
previous frame. A vector is also formed in sub-step 904 from a possible candidate facial
feature in list $L_1$ to a possible candidate facial feature in list $L_3$.

The processor 205 then calculates in sub-step 908 the Euclidean distance
between the vectors formed in sub-steps 902 and 904 respectively. The processor 205
then determines in sub-step 908 whether the Euclidean distance is below a threshold. If
the Euclidean distance is not below the threshold, then the processor 205 determines in
sub-step 909 whether another vector can be formed in sub-step 904 with a different
combination of possible candidate facial features in lists $L_1$ and $L_3$. If another vector can
be formed, then control is passed to sub-step 904, from where sub-steps 904 to 908 are
repeated.

If another vector cannot be formed with a different combination of possible
candidate facial features in lists $L_1$ and $L_3$ then, in sub-step 911, the possible candidate
facial feature in lists $L_1$ and $L_3$ with the smallest Euclidean distance $d_{i1}$ or $d_{i3}$ (calculated
in step 150) is assigned to its corresponding feature by saving its
information, while the coordinates of the other two features are set to "unknown". Step
155 then ends.

If the processor 205 determines in sub-step 908 that the Euclidean distance is
below the threshold, then the processor 205 saves the information of the two possible
candidate facial features forming the vector as the left eye and mouth respectively.
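Sub-steps 902 to 911 search the candidate pairs for a displacement that matches the previous frame's left-eye-to-mouth vector, falling back to the single best candidate when no pair matches. A minimal sketch, assuming `(x, y)` tuples, pairs visited in list order (the text does not fix an order) and acceptance of the first pair below the threshold, as described above; the function names and data layout are hypothetical. The same routine would serve scenarios 6 and 7 with different lists and reference vectors.

```python
import math
from itertools import product
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def match_pair(prev_a: Point, prev_b: Point, cands: Sequence[Point],
               list_a: List[int], list_b: List[int],
               threshold: float) -> Optional[Tuple[int, int]]:
    """Return the first pair (i, k) from list_a x list_b whose displacement
    vector is within `threshold` of the previous-frame vector prev_a -> prev_b."""
    ref = (prev_b[0] - prev_a[0], prev_b[1] - prev_a[1])             # sub-step 902
    for i, k in product(list_a, list_b):                              # sub-steps 904, 909
        vec = (cands[k][0] - cands[i][0], cands[k][1] - cands[i][1])
        if math.hypot(vec[0] - ref[0], vec[1] - ref[1]) < threshold:  # distance test
            return i, k
    return None


def fallback(d: Dict[Tuple[int, int], float],
             lists: Dict[int, List[int]]) -> Tuple[int, int]:
    """Sub-step 911: the (candidate i, feature j) pair with the smallest d_ij
    from step 150; the other features would then be set to "unknown"."""
    return min(((i, j) for j, l_j in lists.items() for i in l_j), key=lambda p: d[p])


cands = [(11.0, 21.0), (50.0, 50.0), (15.0, 11.0), (15.0, 40.0)]
print(match_pair((10.0, 20.0), (14.0, 10.0), cands, [0, 1], [2, 3], 2.0))  # (0, 2)
```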

The processor 205 then determines in sub-step 912 whether the coordinates
$(x_2, y_2)_t$ of the right eye were known in the previous frame. If the coordinates $(x_2, y_2)_t$ of the
right eye were not known, then the right eye was not detected in the previous frame, but
may have reappeared in the current frame. Accordingly, sub-step 914 attempts to recover
the right eye by creating a triangle defined by the coordinates of the left eye $(x_1, y_1)_{t+1}$, the
mouth $(x_3, y_3)_{t+1}$ and any of the candidate features $(x_i, y_i)'$, starting with the candidate
facial feature with the lowest characteristic similarity measure $D_{ij}$. If the conditions of
Equation (17) are met, then the candidate facial feature under consideration corresponds
with the right eye.

The processor 205 then determines in sub-step 916 whether a facial feature
corresponding to the right eye was found in sub-step 914. If a facial feature
corresponding to the right eye was found then, in sub-step 918, the processor 205 saves
the position $(x_i, y_i)'$ of the candidate facial feature corresponding to the right eye as
coordinates $(x_2, y_2)_{t+1}$, together with its area $A_i'$ as area $A_2$ and its eccentricity $\varepsilon_i'$ as
eccentricity $\varepsilon_2$. Step 155 then ends.

If it is determined in sub-step 912 that the coordinates $(x_2, y_2)_t$ of the right eye
were known in the previous frame, or if it is determined in sub-step 916 that a facial
feature corresponding to the right eye was not found, then the right eye disappeared in the
current frame. Step 155 proceeds to sub-step 920 where the processor 205 sets the
coordinates $(x_2, y_2)_{t+1}$ to "unknown". Step 155 then ends after sub-step 920.

In the case when scenario 6 exists, the right eye and the mouth each have at least one
corresponding candidate facial feature. Fig. 6E shows a flow diagram of the sub-steps of
step 155 in the case when scenario 6 exists. Step 155 then starts in sub-step 932 where
the processor 205 forms a vector $(x_3 - x_2, y_3 - y_2)_t$ from the right eye to the mouth in the
previous frame. A vector is also formed in sub-step 934 from a possible candidate facial
feature in list $L_2$ to a possible candidate facial feature in list $L_3$.

The processor 205 then calculates in sub-step 938 the Euclidean distance
between the vectors formed in sub-steps 932 and 934 respectively. The processor 205
then determines in sub-step 938 whether the Euclidean distance is below a threshold. If
the Euclidean distance is not below the threshold, then the processor 205 determines in
sub-step 939 whether another vector can be formed in sub-step 934 with a different
combination of possible candidate facial features in lists $L_2$ and $L_3$. If another vector can
be formed, then control is passed to sub-step 934, from where sub-steps 934 to 938 are
repeated.

If another vector cannot be formed with a different combination of possible
candidate facial features in lists $L_2$ and $L_3$ then, in sub-step 941, the possible candidate
facial feature in lists $L_2$ and $L_3$ with the smallest Euclidean distance $d_{i2}$ or $d_{i3}$ (calculated
in step 150) is assigned to its corresponding feature by saving its
information, while the coordinates of the other two features are set to "unknown". Step
155 then ends.

If the processor 205 determines in sub-step 938 that the Euclidean distance is
below the threshold, then the processor 205 saves the information of the two possible
candidate facial features forming the vector as the right eye and mouth respectively.

The processor 205 then determines in sub-step 942 whether the coordinates
$(x_1, y_1)_t$ of the left eye were known in the previous frame. If the coordinates $(x_1, y_1)_t$ of the
left eye were not known, then the left eye was not detected in the previous frame, but may
have reappeared in the current frame. Accordingly, sub-step 944 attempts to recover the
left eye by creating a triangle defined by the coordinates of the right eye $(x_2, y_2)_{t+1}$, the
mouth $(x_3, y_3)_{t+1}$ and any of the candidate features $(x_i, y_i)'$, starting with the candidate
facial feature with the lowest characteristic similarity measure $D_{ij}$. If the conditions of
Equation (17) are met, then the candidate facial feature under consideration corresponds
with the left eye.

The processor 205 then determines in sub-step 946 whether a facial feature
corresponding to the left eye was found in sub-step 944. If a facial feature corresponding
to the left eye was found then, in sub-step 948, the processor 205 saves the position $(x_i, y_i)'$
of the candidate facial feature corresponding to the left eye as coordinates $(x_1, y_1)_{t+1}$,
together with its area $A_i'$ as area $A_1$ and its eccentricity $\varepsilon_i'$ as eccentricity $\varepsilon_1$. Step 155 then
ends.

If it is determined in sub-step 942 that the coordinates $(x_1, y_1)_t$ of the left eye were
known in the previous frame, or if it is determined in sub-step 946 that a facial feature
corresponding to the left eye was not found, then the left eye disappeared in the current
frame. Step 155 proceeds to sub-step 950 where the processor 205 sets the coordinates
$(x_1, y_1)_{t+1}$ to "unknown". Step 155 then ends after sub-step 950.

In the case when scenario 7 exists, each of the eyes has at least one
corresponding candidate facial feature. Fig. 6F shows a flow diagram of the sub-steps of
step 155 in the case when scenario 7 exists. Step 155 for this scenario starts in sub-step
962 where the processor 205 forms a vector $(x_2 - x_1, y_2 - y_1)_t$ from the left eye to the right eye
in the previous frame. A vector is also formed in sub-step 964 from a possible candidate
facial feature in list $L_1$ to a possible candidate facial feature in list $L_2$.

The processor 205 then calculates in sub-step 966 the Euclidean distance
between the vectors formed in sub-steps 962 and 964 respectively. The processor 205
then determines in sub-step 968 whether the Euclidean distance is below a threshold. If
the Euclidean distance is not below the threshold, then the processor 205 determines in
sub-step 969 whether another vector can be formed in sub-step 964 with a different
combination of possible candidate facial features in lists $L_1$ and $L_2$. If another vector can
be formed, then control is passed to sub-step 964, from where sub-steps 964 to 968 are
repeated.

If another vector cannot be formed with a different combination of possible
candidate facial features in lists $L_1$ and $L_2$ then, in sub-step 971, the possible candidate
facial feature in lists $L_1$ and $L_2$ with the smallest Euclidean distance $d_{i1}$ or $d_{i2}$ (calculated
in step 150) is assigned to its corresponding feature by saving its
information, while the coordinates of the other two features are set to "unknown". Step
155 then ends.

If the processor 205 determines in sub-step 968 that the Euclidean distance is
below the threshold, then the processor 205 saves the information of the two possible
candidate facial features forming the vector as the left eye and right eye respectively.

The processor 205 then determines in sub-step 972 whether the coordinates
$(x_3, y_3)_t$ of the mouth were known in the previous frame. If the coordinates $(x_3, y_3)_t$ of the
mouth were not known, then the mouth was not detected in the previous frame, but may
have reappeared in the current frame. Accordingly, sub-step 974 attempts to recover the
mouth by creating a triangle defined by the coordinates of the left eye $(x_1, y_1)_{t+1}$, the right
eye $(x_2, y_2)_{t+1}$ and any of the candidate features $(x_i, y_i)'$, starting with the candidate facial
feature with the lowest characteristic similarity measure $D_{ij}$. If the conditions of Equation
(17) are met, then the candidate facial feature under consideration corresponds with the
mouth.

The processor 205 then determines in sub-step 976 whether a facial feature
corresponding to the mouth was found in sub-step 974. If a facial feature corresponding
to the mouth was found then, in sub-step 978, the processor 205 saves the position $(x_i, y_i)'$
of the candidate facial feature corresponding to the mouth as coordinates $(x_3, y_3)_{t+1}$,
together with its area $A_i'$ as area $A_3$ and its eccentricity $\varepsilon_i'$ as eccentricity $\varepsilon_3$. Step 155 then
ends.

If it is determined in sub-step 972 that the coordinates $(x_3, y_3)_t$ of the mouth were
known in the previous frame, or if it is determined in sub-step 976 that a facial feature
corresponding to the mouth was not found, then the mouth disappeared in the current
frame. Step 155 proceeds to sub-step 980 where the processor 205 sets the coordinates
$(x_3, y_3)_{t+1}$ to "unknown". Step 155 then ends after sub-step 980.

Finally, in the case when scenario 8 exists, each of the eyes and the mouth has at
least one corresponding candidate facial feature. Fig. 6G shows a flow diagram of the
sub-steps of step 155 in the case when scenario 8 exists. In this case step 155 starts in
sub-step 1002 where the processor 205 forms a vector $(x_2 - x_1, y_2 - y_1)_t$ from the left eye to the
right eye in the previous frame. A vector is also formed in sub-step 1004 from a possible
candidate facial feature in list $L_1$ to a possible candidate facial feature in list $L_2$.

The processor 205 then calculates in sub-step 1008 the Euclidean distance
between the vectors formed in sub-steps 1002 and 1004 respectively. The processor 205
then determines in sub-step 1008 whether the Euclidean distance is below a threshold. If
the Euclidean distance is not below the threshold, then the processor 205 determines in
sub-step 1009 whether another vector can be formed in sub-step 1004 with a different
combination of possible candidate facial features in lists $L_1$ and $L_2$. If another vector can
be formed, then control is passed to sub-step 1004, from where sub-steps 1004 to 1008 are
repeated.

If the processor 205 determines in sub-step 1008 that the Euclidean distance is
below the threshold, then the processor 205 saves in sub-step 1010 the information of the two
possible candidate facial features forming the vector as the left eye and right eye
respectively. With the coordinates of the eyes now known, the processor 205 forms in
sub-step 1012 a vector

$$\left(x_3 - \tfrac{1}{2}(x_1 + x_2),\; y_3 - \tfrac{1}{2}(y_1 + y_2)\right)_t$$

from the centre of the eye-eye line to the mouth in the previous frame. A vector is also
formed in sub-step 1014 from the centre of the current eye-eye line, which is at
$\left(\tfrac{1}{2}(x_1 + x_2), \tfrac{1}{2}(y_1 + y_2)\right)_{t+1}$, to a possible candidate facial feature in list $L_3$.

The processor 205 then calculates in sub-step 1016 the Euclidean distance
between the vectors formed in sub-steps 1012 and 1014 respectively. The processor 205
then determines in sub-step 1018 whether the Euclidean distance is below a threshold. If
the Euclidean distance is not below the threshold, then the processor 205 determines in
sub-step 1019 whether another vector can be formed in sub-step 1014 with a different possible
candidate facial feature from list $L_3$. If another vector can be formed, then control is
passed to sub-step 1014, from where sub-steps 1014 to 1018 are repeated.

If another vector cannot be formed with a different possible candidate facial
feature from list $L_3$ then, in sub-step 1021, the processor 205 sets the
coordinates $(x_3, y_3)_{t+1}$ to "unknown". Step 155 then ends after sub-step 1021.

If the processor 205 determines in sub-step 1018 that the Euclidean distance is
below the threshold, then the processor 205 saves in sub-step 1020 the position $(x_i, y_i)'$ of the
candidate facial feature corresponding to the mouth as coordinates $(x_3, y_3)_{t+1}$, together
with its area $A_i'$ as area $A_3$ and its eccentricity $\varepsilon_i'$ as eccentricity $\varepsilon_3$. Step 155 then ends.
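The mouth search in sub-steps 1012 to 1021 compares vectors anchored at the centre of the eye-eye line rather than at a single eye, which treats both eyes symmetrically. A minimal sketch, assuming `(x, y)` tuples and first-match acceptance in list order; the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def midpoint(p: Point, q: Point) -> Point:
    """Centre of the line joining p and q."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def match_mouth(prev_left: Point, prev_right: Point, prev_mouth: Point,
                cur_left: Point, cur_right: Point,
                cands: Sequence[Point], list3: List[int],
                threshold: float) -> Optional[int]:
    """Sub-steps 1012 to 1021: return the first mouth candidate whose vector from
    the current eye-eye centre matches the previous centre-to-mouth vector."""
    c_prev = midpoint(prev_left, prev_right)
    ref = (prev_mouth[0] - c_prev[0], prev_mouth[1] - c_prev[1])      # sub-step 1012
    c_cur = midpoint(cur_left, cur_right)
    for i in list3:                                                   # sub-steps 1014, 1019
        vec = (cands[i][0] - c_cur[0], cands[i][1] - c_cur[1])
        if math.hypot(vec[0] - ref[0], vec[1] - ref[1]) < threshold:  # sub-steps 1016, 1018
            return i                                                  # sub-step 1020
    return None                                                       # sub-step 1021: "unknown"


cands = [(16.0, 40.0), (16.5, 10.5)]
print(match_mouth((10.0, 20.0), (18.0, 20.0), (14.0, 8.0),
                  (12.0, 22.0), (20.0, 22.0), cands, [0, 1], 2.0))  # 1
```

In the usage example the previous centre-to-mouth vector is (0, -12); candidate 1 gives (0.5, -11.5) from the current centre, about 0.7 away, so it is accepted.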

If it is determined in sub-step 1009 that another vector cannot be formed with a different
combination of possible candidate facial features in lists $L_1$ and $L_2$ then, in sub-step 1042,
the processor 205 forms a vector $(x_3 - x_1, y_3 - y_1)_t$ from the left eye to the mouth in the
previous frame. A vector is also formed in sub-step 1044 from a possible candidate facial
feature in list $L_1$ to a possible candidate facial feature in list $L_3$.

The processor 205 then calculates in sub-step 1046 the Euclidean distance
between the vectors formed in sub-steps 1042 and 1044 respectively. The processor 205
then determines in sub-step 1048 whether the Euclidean distance is below a threshold.

If the Euclidean distance is below the threshold, then the processor 205 saves the
information of the two possible candidate facial features forming the vector as the left eye
and mouth respectively in sub-step 1050 before step 155 ends.

If the Euclidean distance is not below the threshold, then the processor 205
determines in sub-step 1049 whether another vector can be formed in sub-step 1044 with
a different combination of possible candidate facial features in lists $L_1$ and $L_3$. If another
vector can be formed, then control is passed to sub-step 1044, from where sub-steps 1044
to 1048 are repeated.

If another vector cannot be formed with a different combination of possible
candidate facial features in lists $L_1$ and $L_3$ then, in sub-step 1052, the processor 205 forms
a vector $(x_3 - x_2, y_3 - y_2)_t$ from the right eye to the mouth in the previous frame. A vector is
also formed in sub-step 1054 from a possible candidate facial feature in list $L_2$ to a
possible candidate facial feature in list $L_3$.

The processor 205 then calculates in sub-step 1056 the Euclidean distance
between the vectors formed in sub-steps 1052 and 1054 respectively. The processor 205
then determines in sub-step 1058 whether the Euclidean distance is below a threshold. If
the Euclidean distance is not below the threshold, then the processor 205 determines in
sub-step 1059 whether another vector can be formed in sub-step 1054 with a different
combination of possible candidate facial features in lists $L_2$ and $L_3$. If another vector can
be formed, then control is passed to sub-step 1054, from where sub-steps 1054 to 1058 are
repeated.

If another vector cannot be formed with a different combination of possible
candidate facial features in lists $L_2$ and $L_3$ then, in sub-step 1062, the possible candidate
facial feature in lists $L_1$, $L_2$ and $L_3$ with the smallest Euclidean distance $d_{i1}$, $d_{i2}$ or $d_{i3}$
(calculated in step 150) is assigned to its corresponding feature by
saving its information, while the coordinates of the other two features are set to
"unknown". Step 155 then ends.

If the processor 205 determines in sub-step 1058 that the Euclidean distance is
below the threshold, then the processor 205 saves the information of the two possible
candidate facial features forming the vector as the right eye and mouth respectively.

In summary, irrespective of which scenario exists, step 155 assigns one of the
possible candidate facial features to each facial feature for tracking where possible, and
attempts to recover facial features where appropriate using known geometric relations
between facial features.

Referring again to Fig. 1, after performing step 155 the processor 205 determines
in step 160 whether there are any more frames available. If no more frames are available
for processing, then method 100 ends in step 180.

If more frames are available for processing, then the method 100 continues to step
170, where a new frame is received by the processor 205. The new frame is added to the
block of pixel data, while the oldest frame in the block is removed from the block. The
segmentation step 120 is again performed on the updated block of pixel data.

After the segmentation step 120 described with reference to Fig. 3 has been
performed a first time, in subsequent executions of the segmentation step 120 the 3D
segments Si formed in a previous segmentation are maintained in sub-step 304. The
pixels of the new frame are added to the data structure representing the current segments Si
as new individual segments, each with model parameter a(x,y,n) set to the colour value
of that pixel, and observing the correct adjacency relations with existing segments Si. The
effect of the segmentation step 120 is thus to merge the unsegmented pixels of the new
frame into the existing 3D segments Si from a previous segmentation. However, the
model parameters a(x,y,n) of those existing segments Si from a previous segmentation
may adjust due to the information contained in the new frame.
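The step of adding the new frame's pixels as individual segments can be sketched as follows. This is a minimal illustration under stated assumptions, not the method's actual data structure: it assumes a dictionary label map keyed by (x, y, frame index), integer segment identifiers, a colour value used directly as the model parameter, and 6-connectivity in which each pixel is adjacent to its left and upper neighbours and to the same position in the previous frame. The function name add_frame_as_singletons is hypothetical, and the subsequent merging of segments is not shown.

```python
import itertools


def add_frame_as_singletons(labels, params, adjacency, frame, n):
    """Add every pixel of frame n as its own segment (hypothetical sketch).

    labels:    dict (x, y, frame_index) -> segment id (existing segmentation)
    params:    dict segment id -> model parameter (here, a colour value)
    adjacency: set of frozenset({id_a, id_b}) undirected adjacency pairs
    frame:     2-D list of colour values indexed frame[y][x]
    """
    next_id = max(params, default=-1) + 1
    height, width = len(frame), len(frame[0])
    for y, x in itertools.product(range(height), range(width)):
        labels[(x, y, n)] = next_id
        params[next_id] = frame[y][x]  # model parameter = pixel colour
        next_id += 1
    # Link each new segment to its left, upper and previous-frame neighbours;
    # visiting only these three covers every adjacent pair exactly once.
    for y, x in itertools.product(range(height), range(width)):
        own = labels[(x, y, n)]
        for key in ((x - 1, y, n), (x, y - 1, n), (x, y, n - 1)):
            other = labels.get(key)
            if other is not None and other != own:
                adjacency.add(frozenset((own, other)))
```

The function updates its arguments in place, so an existing segmentation can be extended without copying it.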

The sequence of coordinates (x, y)_t provides the positions of the features in each
frame t. It is noted that the coordinates (x, y)_t are transformed back to the coordinate
system of the frames before being output.

The foregoing describes only some embodiments of the present invention, and 
modifications and/or changes can be made thereto without departing from the scope and 
spirit of the invention, the embodiments being illustrative and not restrictive. The 
embodiment described tracks the facial features of a single face, but it will be
understood by those skilled in the art that the facial features of any number of faces may
be tracked in the same manner.

In the context of this specification, the word "comprising" means "including 
principally but not necessarily solely" or "having" or "including", and not "consisting 
only of". Variations of the word "comprising", such as "comprise" and "comprises", have
correspondingly varied meanings.




The claims defining the invention are as follows: 

1. A method of tracking facial features in a video sequence, said method 
comprising the steps of: 

(a) receiving positions of facial features for tracking in a first frame of said video
sequence;

(b) spatiotemporally segmenting said video sequence to provide a sequence of 
associated two-dimensional segments, a first segment in said sequence of associated two- 
dimensional segments including said facial features for tracking; 
(c) identifying candidate facial features in at least a second two-dimensional
segment in said sequence of associated segments; and

(d) verifying which of said candidate facial features correspond with said facial 
features for tracking. 

2. A method as claimed in claim 1 comprising the further step of:

(e) recovering lost facial features by using known geometric relations between 
facial features. 

3. A method as claimed in claim 1 or 2 wherein step (c) comprises the sub-steps
of:

(ci) forming a sub-image including said two-dimensional segment in said 
sequence of associated segments; 

(cii) normalising the size of said sub-image; and 

(ciii) identifying candidate facial features in said normalised sub-image. 
4. A method as claimed in any one of claims 1 to 3 wherein step (d) measures
the correspondence between said candidate facial features and said facial features for 
tracking. 

5. A method as claimed in claim 4 wherein step (d) comprises determining
whether said candidate facial features are within a region of each position of said facial
features for tracking in a previous frame.

6. A method as claimed in claim 5 wherein step (d) further comprises
determining whether said candidate facial features within each of said regions are
similar in shape to said facial features for tracking in said previous frame.

7. A method of tracking facial features in a sequence of associated two- 
dimensional segments, a first segment in said sequence of associated two-dimensional 
segments including facial features for tracking, said method comprising the steps of: 

(a) identifying candidate facial features in at least a second two-dimensional 
segment in said sequence of associated segments; 

(b) verifying which of said candidate facial features correspond with said facial 
features for tracking; and 

(c) recovering lost facial features by using known geometric relations between 
facial features. 

8. A method as claimed in claim 7 wherein step (a) comprises the further sub- 
steps of: 
(ai) forming a sub-image including said two-dimensional segment in said
sequence of associated segments; 

(aii) normalising the size of said sub-image; and 

(aiii) identifying candidate facial features in said normalised sub-image. 

9. A method as claimed in claim 7 or 8 wherein step (b) measures
the correspondence between said candidate facial features and said facial features for 
tracking. 

10. A method as claimed in claim 9 wherein step (b) comprises
determining whether said candidate facial features are within a region of each position of
said facial features for tracking in a previous frame.

11. A method as claimed in claim 10 wherein step (b) further comprises
determining whether said candidate facial features within each of said regions are
similar in shape to said facial features for tracking in said previous frame.

12. An apparatus for tracking facial features in a video sequence, said 
apparatus comprising: 

means for receiving positions of facial features for tracking in a first frame of
said video sequence;

means for spatiotemporally segmenting said video sequence to provide a 
sequence of associated two-dimensional segments, a first segment in said sequence of 
associated two-dimensional segments including said facial features for tracking; 
means for identifying candidate facial features in at least a second two-
dimensional segment in said sequence of associated segments; and 

means for verifying which of said candidate facial features correspond with said 
facial features for tracking. 

13. An apparatus as claimed in claim 12 further comprising: 
means for recovering lost facial features by using known geometric relations 
between facial features. 



14. An apparatus as claimed in claim 12 or 13 wherein said means for
identifying comprises:

means for forming a sub-image including said two-dimensional segment in said 
sequence of associated segments; 

means for normalising the size of said sub-image; and 
means for identifying candidate facial features in said normalised sub-image.

15. An apparatus for tracking facial features in a sequence of associated 
two-dimensional segments, a first segment in said sequence of associated two- 
dimensional segments including facial features for tracking, said apparatus comprising: 
means for identifying candidate facial features in at least a second two-
dimensional segment in said sequence of associated segments;

means for verifying which of said candidate facial features correspond with said 
facial features for tracking; and 

means for recovering lost facial features by using known geometric relations 
between facial features.
16. An apparatus as claimed in claim 15 wherein said means for identifying
comprises: 

means for forming a sub-image including said two-dimensional segment in said 
sequence of associated segments;

means for normalising the size of said sub-image; and 

means for identifying candidate facial features in said normalised sub-image. 



17. A program stored on a memory medium for tracking facial features in a 
video sequence, said program comprising:

code for receiving positions of facial features for tracking in a first frame of said
video sequence;

code for spatiotemporally segmenting said video sequence to provide a sequence 
of associated two-dimensional segments, a first segment in said sequence of associated 
two-dimensional segments including said facial features for tracking;

code for identifying candidate facial features in at least a second two- 
dimensional segment in said sequence of associated segments; and 

code for verifying which of said candidate facial features correspond with said 
facial features for tracking. 

18. A program as claimed in claim 17 further comprising: 

code for recovering lost facial features by using known geometric relations 
between facial features. 
19. A program as claimed in claim 17 or 18 wherein said code for
identifying comprises: 

code for forming a sub-image including said two-dimensional segment in said 
sequence of associated segments; 

code for normalising the size of said sub-image; and 

code for identifying candidate facial features in said normalised sub-image.

20. A program stored on a memory medium for tracking facial features in a 
sequence of associated two-dimensional segments, a first segment in said sequence of 
associated two-dimensional segments including facial features for tracking, said program 
comprising: 

code for identifying candidate facial features in at least a second two- 
dimensional segment in said sequence of associated segments; 

code for verifying which of said candidate facial features correspond with said 
facial features for tracking; and 

code for recovering lost facial features by using known geometric relations 
between facial features. 

21. A program as claimed in claim 20 wherein said code for identifying 
comprises: 

code for forming a sub-image including said two-dimensional segment in said 
sequence of associated segments; 

code for normalising the size of said sub-image; and 

code for identifying candidate facial features in said normalised sub-image. 
22. A method of tracking facial features in a video sequence, said method
being substantially as herein described with reference to the accompanying drawings. 

23. An apparatus for tracking facial features in a video sequence, said 
apparatus being substantially as herein described with reference to the accompanying
drawings.

24. A program for tracking facial features in a video sequence, said program 
being substantially as herein described with reference to the accompanying drawings. 

DATED this 18th Day of September 2002
CANON KABUSHIKI KAISHA 

Patent Attorneys for the Applicant 
SPRUSON & FERGUSON